Module 1: Participation 2

Group Paper Presentation

Published: November 21, 2024
Modified: February 17, 2026

Overview

In this participation assignment, you will work in groups to study, synthesize, and present a foundational or state-of-the-art research paper in Generative AI, Large Language Models (LLMs), or reproducible computational practice.

The goal is not to reproduce the paper technically, but to:

  • Understand why the paper mattered
  • Explain what problem it solved
  • Situate it within the modern GenAI stack
  • Critically assess its assumptions, limitations, and downstream implications

Each group will deliver a short academic-style presentation aimed at a technically literate but non-specialist audience (e.g., analytics managers, graduate students, applied researchers).

This assignment emphasizes:

  • Conceptual clarity
  • Systems thinking
  • Research literacy
  • Responsible AI awareness

Goals

By completing this assignment, you will:

  • Develop the ability to read and interpret AI research papers
  • Learn how modern GenAI systems evolved from earlier computational ideas
  • Practice explaining complex ideas clearly and precisely
  • Engage critically with reproducibility, scale, alignment, and responsibility
  • Strengthen academic and professional presentation skills

1 Paper Selection (One Paper per Group)

  • Everyone must read through the curated list below.
  • Papers are organized by theme.
  • Each group will present one paper from its assigned week’s theme, listed in the numbered sections below.
  • Paper titles appear in the tables; full citations are collected in the References at the end of this document.
  • You can also find these papers on scholar.google.com.

1.1 Open science & reproducibility (classic but still essential)

Reading instruction: read these papers; no presentation is required.

  • Wilson et al. (2014) Best Practices for Scientific Computing
  • Peng (2011) Reproducible Research in Computational Science
  • Stodden et al. (2014) The Practice of Reproducible Research
  • Stodden et al. (2016) Computational Reproducibility
  • Knuth (1984) Literate Programming
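
These readings argue for habits such as seeding randomness and recording the computational environment alongside results. The sketch below is a minimal Python illustration of those habits; the file name and the `record_environment` helper are our own illustrative choices, not something prescribed by the papers.

```python
# Minimal reproducibility sketch: fix seeds and record the environment
# before running any stochastic experiment.
import json
import platform
import random
import sys

import numpy as np

SEED = 42  # fixed seed so reruns produce identical random draws

random.seed(SEED)     # Python's built-in RNG
np.random.seed(SEED)  # NumPy's global RNG


def record_environment(path="environment.json"):
    """Write interpreter and platform details alongside the results."""
    info = {
        "python": sys.version,
        "platform": platform.platform(),
        "numpy": np.__version__,
        "seed": SEED,
    }
    with open(path, "w") as f:
        json.dump(info, f, indent=2)


record_environment()
print(np.random.normal(size=3))  # identical output on every rerun
```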

1.2 Foundational Readings in Language Representation

Reading instruction: read these papers; no presentation is required.

| Neural Foundations | Distributional Semantics | Embeddings in Practice |
|---|---|---|
| A Neural Probabilistic Language Model (Bengio et al. 2003) | Vector Space Models of Semantics (Turney and Pantel 2010) | Distributed Representations of Words (Mikolov, Sutskever, et al. 2013) |
| Sequence to Sequence Learning (Sutskever et al. 2014) | Are LLMs Models of Distributional Semantics? A Case Study on Quantifiers (Enyan et al. 2024) | Efficient Estimation of Word Representations in Vector Space (Mikolov, Chen, et al. 2013) |
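
To make the “Embeddings in Practice” column concrete: the word2vec papers (Mikolov, Chen, et al. 2013; Mikolov, Sutskever, et al. 2013) popularized the analogy test, where vector arithmetic such as king − man + woman lands nearest to queen. The sketch below reproduces that arithmetic with hand-made toy vectors; the four-dimensional values are illustrative, not trained embeddings.

```python
# Toy illustration of the word2vec analogy test:
# king - man + woman should land nearest to queen.
import numpy as np

vocab = {
    "king":  np.array([0.9, 0.8, 0.1, 0.2]),
    "queen": np.array([0.9, 0.1, 0.8, 0.2]),
    "man":   np.array([0.1, 0.9, 0.1, 0.3]),
    "woman": np.array([0.1, 0.1, 0.9, 0.3]),
    "apple": np.array([0.2, 0.2, 0.2, 0.9]),  # unrelated distractor
}


def nearest(vec, exclude):
    """Return the vocabulary word with the highest cosine similarity to vec."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in vocab if w not in exclude),
               key=lambda w: cos(vocab[w], vec))


target = vocab["king"] - vocab["man"] + vocab["woman"]
print(nearest(target, exclude={"king", "man", "woman"}))  # -> "queen"
```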

1.3 Group 1 – Transformers: Architecture and Attention

| Transformer Core | Interpretability & Attention | Representational Power |
|---|---|---|
| Attention Is All You Need (Vaswani et al. 2017) | What Does BERT Look At? (Clark et al. 2019) | On the Turing Completeness of Modern Neural Network Architectures (Pérez et al. 2019) |
| Sequence to Sequence Learning (Sutskever et al. 2014) | Quantifying Attention Flow in Transformers (Abnar and Zuidema 2020) | Are Transformers Universal Approximators? (Yun et al. 2019) |
| Self-Attention with Relative Position (Shaw et al. 2018) | Attention Is Not Explanation (Jain and Wallace 2019) | Efficient Streaming Language Models (Xiao et al. 2023) |
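
If your group presents the Transformer core, a small worked example can anchor the “technical intuition” slide. Below is a minimal NumPy sketch of scaled dot-product attention as defined in Vaswani et al. (2017); the toy shapes are our own, and real implementations add masking, multiple heads, and learned projections.

```python
# Minimal sketch of scaled dot-product attention:
# Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
import numpy as np


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable
    return e / e.sum(axis=axis, keepdims=True)


def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise query-key similarity
    weights = softmax(scores)        # each row sums to 1
    return weights @ V               # weighted average of the values


rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 query positions, d_k = 8
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(attention(Q, K, V).shape)  # (4, 8)
```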

1.4 Group 2 – Scaling & Emergence

Candidate papers for this theme (full citations in the References):

  • Scaling Laws for Neural Language Models (Kaplan et al. 2020)
  • Sparks of Artificial General Intelligence: Early Experiments with GPT-4 (Bubeck et al. 2023)
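
For intuition when presenting this theme: the central empirical claim of Kaplan et al. (2020) is that test loss falls as a power law in model size when data and compute are not bottlenecks. A worked form of the fit, with constants that are approximate values reported in the paper:

```latex
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N},
\qquad \alpha_N \approx 0.076,\quad N_c \approx 8.8 \times 10^{13}
```

Because the exponent is so small, doubling N cuts the loss by only a factor of about 2^{0.076} ≈ 1.05, which is why order-of-magnitude scale-ups were needed before qualitative gains appeared.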

2 Presentation Expectations

2.1 Presentation Length

  • 10–12 minutes total
  • 5–7 slides
  • All group members must participate

2.2 Suggested Slide Structure

Your presentation should include:

  1. Problem Framing

    • What problem did this paper address?
    • Why was it important at the time?
  2. Core Contribution

    • Key idea, model, framework, or insight
    • What changed because of this paper?
  3. Technical Intuition (Not Math-Heavy)

    • Diagrams encouraged
    • Focus on system logic, not equations
  4. Impact and Legacy

    • How does this paper influence modern GenAI systems?
    • Where do we see it today?
  5. Limitations and Critique

    • What does the paper not address?
    • What assumptions may no longer hold?
  6. Relevance to This Course

    • How does this connect to:

      • Prompting
      • RAG
      • Fine-tuning
      • Reproducibility
      • Responsible AI

3 Submission Requirements

  1. Presentation Slides (PDF)
  2. One-page Paper Brief (PDF) including:
    • Paper citation
    • Key contribution (≤150 words)
    • One critique
    • One open research question

3.1 Hints and Best Practices

  • Focus on ideas, not implementation details
  • Assume your audience understands Python and ML basics
  • Use diagrams over equations
  • Avoid reading slides verbatim
  • Practice explaining the paper without jargon

This assignment is designed to help you think like a researcher and systems designer, not just a tool user. Choose wisely, read deeply, and present with clarity.

References

Abnar, S., and Zuidema, W. 2020. “Quantifying Attention Flow in Transformers,” arXiv Preprint arXiv:2005.00928.
Bengio, Y., Ducharme, R., Vincent, P., and Jauvin, C. 2003. “A Neural Probabilistic Language Model,” Journal of Machine Learning Research (3), pp. 1137–1155. (https://www.jmlr.org/papers/v3/bengio03a.html).
Bubeck, S., Chandrasekaran, V., Eldan, R., and others. 2023. “Sparks of Artificial General Intelligence: Early Experiments with GPT-4,” arXiv Preprint arXiv:2303.12712. (https://arxiv.org/abs/2303.12712).
Clark, K., Khandelwal, U., Levy, O., and Manning, C. D. 2019. “What Does BERT Look at? An Analysis of BERT’s Attention,” arXiv Preprint arXiv:1906.04341.
Enyan, Z., Wang, Z., Lepori, M. A., Pavlick, E., and Aparicio, H. 2024. “Are LLMs Models of Distributional Semantics? A Case Study on Quantifiers,” arXiv Preprint arXiv:2410.13984. (http://arxiv.org/abs/2410.13984).
Jain, S., and Wallace, B. C. 2019. “Attention Is Not Explanation,” arXiv Preprint arXiv:1902.10186.
Kaplan, J., McCandlish, S., Henighan, T., and others. 2020. “Scaling Laws for Neural Language Models,” arXiv Preprint arXiv:2001.08361. (https://arxiv.org/abs/2001.08361).
Mikolov, T., Chen, K., Corrado, G., and Dean, J. 2013. “Efficient Estimation of Word Representations in Vector Space,” in 1st International Conference on Learning Representations (ICLR 2013), Workshop Track. (http://arxiv.org/abs/1301.3781).
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. 2013. “Distributed Representations of Words and Phrases and Their Compositionality,” in Advances in Neural Information Processing Systems (Vol. 26). (https://arxiv.org/abs/1310.4546).
Pérez, J., Marinković, J., and Barceló, P. 2019. “On the Turing Completeness of Modern Neural Network Architectures,” arXiv Preprint arXiv:1901.03429.
Shaw, P., Uszkoreit, J., and Vaswani, A. 2018. “Self-Attention with Relative Position Representations,” in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers) (Vol. 2), Association for Computational Linguistics, March, pp. 464–468. (https://doi.org/10.18653/v1/N18-2074).
Sutskever, I., Vinyals, O., and Le, Q. V. 2014. “Sequence to Sequence Learning with Neural Networks,” in Advances in Neural Information Processing Systems (Vol. 27), Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K. Q. Weinberger (eds.), Curran Associates, Inc. (https://proceedings.neurips.cc/paper_files/paper/2014/file/5a18e133cbf9f257297f410bb7eca942-Paper.pdf).
Turney, P. D., and Pantel, P. 2010. “From Frequency to Meaning: Vector Space Models of Semantics,” Journal of Artificial Intelligence Research (37), pp. 141–188. (https://doi.org/10.1613/jair.2934).
Vaswani, A., Shazeer, N., Parmar, N., and others. 2017. “Attention Is All You Need,” in Advances in Neural Information Processing Systems (Vol. 30). (https://arxiv.org/abs/1706.03762).
Xiao, G., Tian, Y., Chen, B., Han, S., and Lewis, M. 2023. “Efficient Streaming Language Models with Attention Sinks,” arXiv Preprint arXiv:2309.17453.
Yun, C., Bhojanapalli, S., Rawat, A. S., Reddi, S. J., and Kumar, S. 2019. “Are Transformers Universal Approximators of Sequence-to-Sequence Functions?” arXiv Preprint arXiv:1912.10077.